Using Transfer Learning to Assist Exploratory Corpus Annotation

Authors

  • Paul Felt
  • Eric K. Ringger
  • Kevin D. Seppi
  • Kristian Heal
Abstract

We describe an under-studied problem in language resource management: that of providing automatic assistance to annotators working in exploratory settings. When no satisfactory tagset already exists, such as in under-resourced or undocumented languages, it must be developed iteratively while annotating data. This process naturally gives rise to a sequence of datasets, each annotated differently. We argue that this problem is best regarded as a transfer learning problem with multiple source tasks. Using part-of-speech tagging data with simulated exploratory tagsets, we demonstrate that even simple transfer learning techniques can significantly improve the quality of pre-annotations in an exploratory annotation.
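
The "multiple source tasks" framing can be made concrete with a small Python sketch. It is an illustrative assumption, not the method evaluated in the paper: each earlier annotation round, with its own tagset, trains a most-frequent-tag baseline, and the tags those baselines predict are mapped into the current tagset and used as back-off votes for words the current round has not yet labeled. All function names, tag names, and toy sentences below are hypothetical.

```python
from collections import Counter, defaultdict


def train_word_tagger(tagged_sentences):
    """Most-frequent-tag-per-word baseline for one annotation round."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def learn_tag_mapping(source_tagger, target_sentences):
    """Map each source-tagset tag to the target tag it co-occurs with most."""
    cooc = defaultdict(Counter)
    for sentence in target_sentences:
        for word, target_tag in sentence:
            source_tag = source_tagger.get(word)
            if source_tag is not None:
                cooc[source_tag][target_tag] += 1
    return {s: c.most_common(1)[0][0] for s, c in cooc.items()}


def pre_annotate(words, target_tagger, sources):
    """Suggest one target tag per word; None means no suggestion.

    sources is a list of (source_tagger, tag_mapping) pairs,
    one pair per earlier annotation round.
    """
    suggestions = []
    for word in words:
        if word in target_tagger:
            # The current round has already seen this word: trust it.
            suggestions.append(target_tagger[word])
            continue
        # Otherwise let every earlier round vote through its tag mapping.
        votes = Counter()
        for source_tagger, mapping in sources:
            source_tag = source_tagger.get(word)
            if source_tag in mapping:
                votes[mapping[source_tag]] += 1
        suggestions.append(votes.most_common(1)[0][0] if votes else None)
    return suggestions


# Hypothetical toy data: two earlier rounds with different tagsets,
# and a small current round with the newest tagset.
round1 = [[("the", "D"), ("dog", "N"), ("runs", "V")],
          [("a", "D"), ("cat", "N"), ("sleeps", "V")]]
round2 = [[("a", "FUNC"), ("cat", "NOM"), ("sleeps", "VRB")],
          [("the", "FUNC"), ("dog", "NOM")]]
current = [[("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]]

target_tagger = train_word_tagger(current)
sources = []
for earlier_round in (round1, round2):
    tagger = train_word_tagger(earlier_round)
    sources.append((tagger, learn_tag_mapping(tagger, current)))

suggested = pre_annotate(["a", "cat", "sleeps", "quickly"], target_tagger, sources)
```

In this toy run, "a", "cat", and "sleeps" never appear in the current round, yet they still receive suggestions in the current tagset because the earlier rounds vote for them; "quickly" is unseen everywhere and gets no suggestion. A real system would replace the per-word baseline with a proper sequence model.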

Similar resources

Towards Faster Annotation Interfaces for Learning to Filter in Information Extraction and Search

This work explores the design of an annotation interface for a document filtering system based on supervised and semisupervised machine learning, focusing on usability improvements to the user interface to improve the efficiency of annotation without loss of precision, recall, and accuracy. Our objective is to create an automated pipeline for information extraction (IE) and exploratory search f...

Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction

Active learning (AL) is often used in corpus construction (CC) for selecting “informative” documents for annotation. This is ideal for focusing annotation efforts when all documents cannot be annotated, but has the limitation that it is carried out in a closed-loop, selecting points that will improve an existing model. For phenomena-driven and exploratory CC, the lack of existing-models and spe...

Treebank Development with Deductive and Abductive Explanation-based Learning: Exploratory Experiments

In pace with the success of corpus-based approaches to theoretical and computational linguistics, the collocation of corpora has evolved into a research activity in its own right. As the currently available corpora lack either annotation depth or closure, more data will be annotated in the future, preferably with minimal human intervention. This paper tries to approach the problem of treebank develop...

European Association for Computer Assisted Language Learning THE EUROCALL REVIEW

BACKBONE is a European LLP/Languages project (1) (Jan 2009 to Feb 2011), whose overall objective is to provide foreign language teachers in CLIL settings with innovative language learning solutions. To achieve this goal, pedagogic corpora of spoken interviews are combined with corpus-related e-learning activities in blended learning scenarios. The seven BACKBONE corpora contain video interviews in...

Semi-Automatic Sign Language Corpora Annotation using Lexical Representations of Signs

Much current research focuses on the automatic recognition of sign language. High recognition rates are achieved using a lot of training data. This data is generally collected by manually annotating SL video corpora. However, this is time-consuming and the results depend on the annotators' knowledge. In this work we intend to assist the annotation in terms of glosses, which consist of writing down t...

Publication date: 2014